What a Nerd! Beating Students and Vector Cosine in the ESL and TOEFL Datasets
Abstract
In this paper, we claim that Vector Cosine, which is generally considered one of the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by a completely unsupervised measure that evaluates the extent of the intersection among the most associated contexts of two target words, weighting that intersection according to the rank of the shared contexts in the dependency ranked lists. This claim rests on the hypothesis that similar words do not simply occur in similar contexts, but share a larger portion of their most relevant contexts than other related words do. To test it, we describe and evaluate APSyn, a variant of Average Precision that, independently of the adopted parameters, outperforms both Vector Cosine and co-occurrence on the ESL and TOEFL test sets. In the best setting, APSyn reaches 0.73 accuracy on the ESL dataset and 0.70 accuracy on the TOEFL dataset, thereby beating non-English-speaking US college applicants (whose average, as reported in the literature, is 64.50%) and several state-of-the-art approaches.
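The abstract describes the measure only in words, so the following is a minimal sketch of one way to read it, not the authors' implementation. It assumes that each word comes with a dictionary of context association weights, that only the top n contexts of each word are compared, and that each shared context contributes the inverse of its average rank in the two lists. The function names, the cutoff n, and the toy weights are all hypothetical.

```python
from typing import Dict, List

def top_contexts(weights: Dict[str, float], n: int) -> List[str]:
    """Return the n most strongly associated contexts, strongest first."""
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ctx for ctx, _ in ranked[:n]]

def rank_weighted_overlap(w1: Dict[str, float], w2: Dict[str, float], n: int) -> float:
    """Score the intersection of two top-n context lists.

    Assumption: a shared context adds 1 / (average of its two ranks),
    so contexts ranked high in both lists count the most.
    """
    r1 = {c: i for i, c in enumerate(top_contexts(w1, n), start=1)}
    r2 = {c: i for i, c in enumerate(top_contexts(w2, n), start=1)}
    shared = r1.keys() & r2.keys()
    return sum(1.0 / ((r1[c] + r2[c]) / 2.0) for c in shared)

def pick_synonym(target: str, candidates: List[str],
                 vectors: Dict[str, Dict[str, float]], n: int) -> str:
    """Answer a multiple-choice synonym question by taking the best-scoring candidate."""
    return max(candidates,
               key=lambda c: rank_weighted_overlap(vectors[target], vectors[c], n))

# Toy association weights, invented purely for illustration.
toy_vectors = {
    "car":        {"drive": 5.0, "wheel": 4.0, "road": 3.0, "red": 1.0},
    "automobile": {"drive": 4.5, "wheel": 4.2, "engine": 3.0, "road": 2.0},
    "banana":     {"eat": 5.0, "yellow": 4.0, "red": 0.5, "road": 0.1},
}

score = rank_weighted_overlap(toy_vectors["car"], toy_vectors["automobile"], n=3)
choice = pick_synonym("car", ["automobile", "banana"], toy_vectors, n=3)
```

With n = 3, "car" and "automobile" share "drive" (rank 1 in both lists) and "wheel" (rank 2 in both), while "car" and "banana" share nothing among their top contexts, so the sketch selects "automobile". Accuracy on a test set such as ESL or TOEFL would then be the fraction of questions for which the selected candidate is the correct answer.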
Similar resources
A Comparative Study of Class Activities and Students’ Expectations of IELTS and TOEFL iBT Preparation Courses: A Methodological Triangulation Washback Study
Washback refers to the influence of a test on teaching and learning. This study was an attempt to compare the influence of IELTS and TOEFL iBT on the expectations the students brought to their courses and to investigate how these expectations were fulfilled. To this end, 100 IELTS and 120 TOEFL iBT students attending preparation courses took a questionnaire survey, and a sample of their ten cla...
The Effect of Role-Play and Simulation Approach on Enhancing ESL Oral Communication Skills
This study investigated the effect of role-play and simulation approach on Malaysian Polytechnic engineering students’ ESL oral communication skills. In addition, the study examined the students’ perceptions of the effect of the role-play and simulation on their oral communication skills. A mixed method design was employed, using both quantitative and qualitative data collection app...
Motivation, amount of interaction, length of residence, and ESL learners’ pragmatic competence
This study examined how motivation for learning English, the amount of contact with English, and length of residence in the target language area affect Korean graduate students’ English pragmatic skills. The study attempted to account for differential pragmatic development among 50 graduate-level Korean students in relation to the individual factors mentioned above. The data were...
Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words. PMI-IR is empirically evaluated using 80 synonym test questions from the Test of English as a Fo...
Towards a Reappraisal of Literary Competence within the Confines of ESL/EFL Classroom
The present paper aimed at highlighting the judicious incorporation of literary genres (i.e. novel, short story/fiction, drama, and poetry) as a supposedly inspiring teaching technique and an allegedly potent learning resource into ESL/EFL curricula. The rationale behind this pedagogical inclusion is to promote both teaching and learning effectiveness through capitalizing intensively on the gen...
Journal: CoRR
Volume: abs/1603.08701
Publication date: 2016